The advance of computer-aided detection systems using deep learning opened a new scope in endoscopic image analysis. However, the learning-based models developed on closed datasets are susceptible to unknown anomalies in complex clinical environments. In particular, the high false positive rate of polyp detection remains a major challenge in clinical practice. In this work, we release the FPPD-13 dataset, which provides a taxonomy and real-world cases of typical false positives during computer-aided polyp detection in real-world colonoscopy. We further propose a post-hoc module EndoBoost, which can be plugged into generic polyp detection models to filter out false positive predictions. This is realized by generative learning of the polyp manifold with normalizing flows and rejecting false positives through density estimation. Compared to supervised classification, this anomaly detection paradigm achieves better data efficiency and robustness in open-world settings. Extensive experiments demonstrate a promising false positive suppression in both retrospective and prospective validation. In addition, the released dataset can be used to perform 'stress' tests on established detection systems and encourages further research toward robust and reliable computer-aided endoscopic image analysis. The dataset and code will be publicly available at http://endoboost.miccai.cloud.
translated by 谷歌翻译
Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotation) can achieve the equivalent performance of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there is a point-labeled dataset where saliency models trained on it can achieve equivalent performance when trained on the densely annotated dataset. To prove this conjecture, we proposed a novel yet effective adversarial trajectory-ensemble active learning (ATAL). Our contributions are three-fold: 1) Our proposed adversarial attack triggering uncertainty can conquer the overconfidence of existing active learning methods and accurately locate these uncertain pixels. {2)} Our proposed trajectory-ensemble uncertainty estimation method maintains the advantages of the ensemble networks while significantly reducing the computational cost. {3)} Our proposed relationship-aware diversity sampling algorithm can conquer oversampling while boosting performance. Experimental results show that our ATAL can find such a point-labeled dataset, where a saliency model trained on it obtained $97\%$ -- $99\%$ performance of its fully-supervised version with only ten annotated points per image.
translated by 谷歌翻译
Depth cues are known to be useful for visual perception. However, direct measurement of depth is often impracticable. Fortunately, though, modern learning-based methods offer promising depth maps by inference in the wild. In this work, we adapt such depth inference models for object segmentation using the objects' ``pop-out'' prior in 3D. The ``pop-out'' is a simple composition prior that assumes objects reside on the background surface. Such compositional prior allows us to reason about objects in the 3D space. More specifically, we adapt the inferred depth maps such that objects can be localized using only 3D information. Such separation, however, requires knowledge about contact surface which we learn using the weak supervision of the segmentation mask. Our intermediate representation of contact surface, and thereby reasoning about objects purely in 3D, allows us to better transfer the depth knowledge into semantics. The proposed adaptation method uses only the depth model without needing the source data used for training, making the learning process efficient and practical. Our experiments on eight datasets of two challenging tasks, namely camouflaged object detection and salient object detection, consistently demonstrate the benefit of our method in terms of both performance and generalizability.
translated by 谷歌翻译
Automated identification of myocardial scar from late gadolinium enhancement cardiac magnetic resonance images (LGE-CMR) is limited by image noise and artifacts such as those related to motion and partial volume effect. This paper presents a novel joint deep learning (JDL) framework that improves such tasks by utilizing simultaneously learned myocardium segmentations to eliminate negative effects from non-region-of-interest areas. In contrast to previous approaches treating scar detection and myocardium segmentation as separate or parallel tasks, our proposed method introduces a message passing module where the information of myocardium segmentation is directly passed to guide scar detectors. This newly designed network will efficiently exploit joint information from the two related tasks and use all available sources of myocardium segmentation to benefit scar identification. We demonstrate the effectiveness of JDL on LGE-CMR images for automated left ventricular (LV) scar detection, with great potential to improve risk prediction in patients with both ischemic and non-ischemic heart disease and to improve response rates to cardiac resynchronization therapy (CRT) for heart failure patients. Experimental results show that our proposed approach outperforms multiple state-of-the-art methods, including commonly used two-step segmentation-classification networks, and multitask learning schemes where subtasks are indirectly interacted.
translated by 谷歌翻译
The selection of an optimal pacing site, which is ideally scar-free and late activated, is critical to the response of cardiac resynchronization therapy (CRT). Despite the success of current approaches formulating the detection of such late mechanical activation (LMA) regions as a problem of activation time regression, their accuracy remains unsatisfactory, particularly in cases where myocardial scar exists. To address this issue, this paper introduces a multi-task deep learning framework that simultaneously estimates LMA amount and classify the scar-free LMA regions based on cine displacement encoding with stimulated echoes (DENSE) magnetic resonance imaging (MRI). With a newly introduced auxiliary LMA region classification sub-network, our proposed model shows more robustness to the complex pattern cause by myocardial scar, significantly eliminates their negative effects in LMA detection, and in turn improves the performance of scar classification. To evaluate the effectiveness of our method, we tests our model on real cardiac MR images and compare the predicted LMA with the state-of-the-art approaches. It shows that our approach achieves substantially increased accuracy. In addition, we employ the gradient-weighted class activation mapping (Grad-CAM) to visualize the feature maps learned by all methods. Experimental results suggest that our proposed model better recognizes the LMA region pattern.
translated by 谷歌翻译
Automated detecting lung infections from computed tomography (CT) data plays an important role for combating COVID-19. However, there are still some challenges for developing AI system. 1) Most current COVID-19 infection segmentation methods mainly relied on 2D CT images, which lack 3D sequential constraint. 2) Existing 3D CT segmentation methods focus on single-scale representations, which do not achieve the multiple level receptive field sizes on 3D volume. 3) The emergent breaking out of COVID-19 makes it hard to annotate sufficient CT volumes for training deep model. To address these issues, we first build a multiple dimensional-attention convolutional neural network (MDA-CNN) to aggregate multi-scale information along different dimension of input feature maps and impose supervision on multiple predictions from different CNN layers. Second, we assign this MDA-CNN as a basic network into a novel dual multi-scale mean teacher network (DM${^2}$T-Net) for semi-supervised COVID-19 lung infection segmentation on CT volumes by leveraging unlabeled data and exploring the multi-scale information. Our DM${^2}$T-Net encourages multiple predictions at different CNN layers from the student and teacher networks to be consistent for computing a multi-scale consistency loss on unlabeled data, which is then added to the supervised loss on the labeled data from multiple predictions of MDA-CNN. Third, we collect two COVID-19 segmentation datasets to evaluate our method. The experimental results show that our network consistently outperforms the compared state-of-the-art methods.
translated by 谷歌翻译
电动汽车(EV)在自动启动的按需(AMOD)系统中起关键作用,但是它们的独特充电模式增加了AMOD系统中的模型不确定性(例如,状态过渡概率)。由于通常存在训练和测试(真)环境之间的不匹配,因此将模型不确定性纳入系统设计至关重要。但是,在现有文献重新平衡的EV AMOD系统中,尚未明确考虑模型不确定性,并且仍然是一项紧急和挑战的任务。在这项工作中,我们为EV重新平衡和充电问题设计了一个强大而有限的多机构增强学习(MARL)框架。然后,我们提出了一种强大且受限的MARL算法(Rocoma),该算法训练了强大的EV重新平衡政策,以平衡供需比率和整个城市的充电利用率在国家过渡不确定性下。实验表明,Rocoma可以学习有效且强大的重新平衡政策。当存在模型不确定性时,它的表现优于非稳定MAL方法。它使系统公平性增加了19.6%,并使重新平衡成本降低了75.8%。
translated by 谷歌翻译
随着面部伪造技术的快速发展,DeepFake视频在数字媒体上引起了广泛的关注。肇事者大量利用这些视频来传播虚假信息并发表误导性陈述。大多数现有的DeepFake检测方法主要集中于纹理特征,纹理特征可能会受到外部波动(例如照明和噪声)的影响。此外,基于面部地标的检测方法对外部变量更强大,但缺乏足够的细节。因此,如何在空间,时间和频域中有效地挖掘独特的特征,并将其与面部地标融合以进行伪造视频检测仍然是一个悬而未决的问题。为此,我们提出了一个基于多种模式的信息和面部地标的几何特征,提出了地标增强的多模式图神经网络(LEM-GNN)。具体而言,在框架级别上,我们设计了一种融合机制来挖掘空间和频域元素的联合表示,同时引入几何面部特征以增强模型的鲁棒性。在视频级别,我们首先将视频中的每个帧视为图中的节点,然后将时间信息编码到图表的边缘。然后,通过应用图形神经网络(GNN)的消息传递机制,将有效合并多模式特征,以获得视频伪造的全面表示。广泛的实验表明,我们的方法始终优于广泛使用的基准上的最先进(SOTA)。
translated by 谷歌翻译
本文提出了Salenet-端到端卷积神经网络(CNN),用于使用前额叶脑电图(EEG)进行持续注意水平评估。提出了一种偏置驱动的修剪方法,以及小组卷积,全局平均池(GAP),接近零的修剪,重量聚类和模型压缩的量化,达到183.11x的总压缩比。在这项工作中,压缩的分配器在记录的6个受试者EEG数据库上获得了最新的主题无关的持续注意力分类精度为84.2%。该沙发在ARTIX-7 FPGA上实施,竞争功耗为0.11 W,能源效率为8.19 GOPS/W。
translated by 谷歌翻译
心血管疾病是全球死亡的主要原因,是一种与年龄有关的疾病。了解衰老期间心脏的形态和功能变化是一个关键的科学问题,其答案将有助于我们定义心血管疾病的重要危险因素并监测疾病进展。在这项工作中,我们提出了一种新型的条件生成模型,以描述衰老过程中心脏3D解剖学的变化。提出的模型是灵活的,可以将多个临床因素(例如年龄,性别)整合到生成过程中。我们在心脏解剖学的大规模横截面数据集上训练该模型,并在横截面和纵向数据集上进行评估。该模型在预测衰老心脏的纵向演化和对其数据分布进行建模方面表现出了出色的表现。
translated by 谷歌翻译